Single image-level annotations only correctly describe an often small subset of an image's content, particularly when complex real-world scenes are depicted. While this might be acceptable in many classification scenarios, it poses a significant challenge for applications where the set of classes differs considerably between training and test time. In this paper, we take a closer look at the implications in the context of $\textit{few-shot learning}$. Splitting the input samples into patches and encoding these via the help of Vision Transformers allows us to establish semantic correspondences between local regions across images and independent of their respective class. The most informative patch embeddings for the task at hand are then determined as a function of the support set via online optimization at inference time, additionally providing visual interpretability of "$\textit{what matters most}$" in the image. We build on recent advances in unsupervised training of networks via masked image modeling to overcome the lack of fine-grained labels and learn the more general statistical structure of the data, while avoiding the negative influence of image-level annotations, $\textit{aka}$ supervision collapse. Experimental results show the competitiveness of our approach, achieving new state-of-the-art results on four popular few-shot classification benchmarks for $5$-shot and $1$-shot scenarios.
Translated by Google Translate
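As a rough illustration of the class-agnostic patch-correspondence idea described above, the sketch below computes cosine similarities between the patch-token embeddings of two images and matches each patch in one image to its most similar patch in the other. The function name and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def patch_correspondences(tokens_a, tokens_b):
    """Cosine similarity between every patch-token pair of two images.

    tokens_a: (Na, d) patch embeddings of image A (e.g., ViT tokens)
    tokens_b: (Nb, d) patch embeddings of image B
    Returns the (Na, Nb) similarity matrix and, for each patch of A,
    the index of its most similar patch in B.
    """
    a = tokens_a / np.linalg.norm(tokens_a, axis=1, keepdims=True)
    b = tokens_b / np.linalg.norm(tokens_b, axis=1, keepdims=True)
    sim = a @ b.T
    return sim, sim.argmax(axis=1)
```

In the paper's setting, such correspondences are established across support and query images irrespective of class labels; the reweighting of the most informative patches would then be optimized online at inference time.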
Learning and generalizing to novel concepts from few samples (few-shot learning) remains an essential challenge for real-world applications. A principled way to achieve few-shot learning is to realize a model that can rapidly adapt to the context of a given task. Dynamic networks have been shown to learn content-adaptive parameters efficiently, making them suitable for few-shot learning. In this paper, we propose to learn the dynamic kernels of a convolutional network as a function of the task at hand, enabling faster generalization. To this end, we obtain our dynamic kernels based on the entire task and each sample, and develop a mechanism that further conditions on each individual channel and position. This results in dynamic kernels that simultaneously take the available fine-grained information into account. We empirically show that our model improves performance on few-shot classification and detection tasks, achieving tangible improvements over several baseline models. This includes state-of-the-art results on four few-shot classification benchmarks: mini-ImageNet, tiered-ImageNet, CUB, and FC100, as well as competitive results on the few-shot detection dataset COCO-PASCAL-VOC.
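A minimal sketch of the task-conditioned kernel idea above: a static convolutional kernel is modulated per output channel by a gate predicted from a pooled task descriptor. All names, shapes, and the sigmoid-gate form are my own simplifying assumptions; the paper additionally conditions on per-sample, per-position information.

```python
import numpy as np

def dynamic_kernel(base_kernel, task_embed, W):
    """Modulate a static conv kernel with a task-conditioned, per-channel gate.

    base_kernel: (C_out, C_in, k, k) static weights
    task_embed:  (d,) descriptor pooled over the task's support set
    W:           (C_out, d) projection producing one gate per output channel
    """
    gate = 1.0 / (1.0 + np.exp(-(W @ task_embed)))  # (C_out,) values in (0, 1)
    return base_kernel * gate[:, None, None, None]
```

The resulting kernel changes with every task, so the same backbone convolves each episode with weights adapted to that episode's support set.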
Learning and generalizing from limited examples, i.e., few-shot learning, is of core importance to many real-world vision applications. A principal way of achieving few-shot learning is to realize an embedding where samples from different classes are distinctive. Recent studies suggest that embedding via hyperbolic geometry enjoys low distortion for hierarchical and structured data, making it suitable for few-shot learning. In this paper, we propose to learn a context-aware hyperbolic metric to characterize the distance between a point and a set associated with a learned set-to-set distance. To this end, we formulate the metric as a weighted sum on the tangent bundle of the hyperbolic space and develop a mechanism to obtain the weights adaptively, based on the constellation of the points. This not only makes the metric local, but also dependent on the task at hand, meaning that the metric adapts according to the samples it compares. We empirically show that such a metric yields robustness in the presence of outliers and achieves tangible improvements over baseline models. This includes state-of-the-art results on five popular few-shot classification benchmarks, namely mini-ImageNet, tiered-ImageNet, Caltech-UCSD Birds-200-2011 (CUB), CIFAR-FS, and FC100.
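To make the point-to-set distance above concrete, here is a sketch using the closed-form geodesic distance on the Poincaré ball and a softmax-weighted combination over the set. The weighting here is a fixed convex combination; in the paper the weights are produced adaptively from the point constellation, and the sum is formulated on the tangent bundle.

```python
import numpy as np

def poincare_dist(x, y, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / (denom + eps))

def point_to_set(x, S, weights):
    """Weighted point-to-set distance: a convex combination of the
    hyperbolic distances from x to every point of the set S."""
    w = np.exp(weights) / np.exp(weights).sum()  # softmax -> convex weights
    return sum(wi * poincare_dist(x, s) for wi, s in zip(w, S))
```

Because the weights can depend on the compared samples, the same support set can induce a different effective metric for different queries, which is the "local and task-dependent" property the abstract refers to.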
Tracking requires building a discriminative model of the target for the inference stage. An effective way to achieve this is online learning, which can conveniently complement models trained offline. Recent research shows that visual tracking benefits significantly from unifying visual tracking and segmentation, owing to the pixel-level discrimination of the latter. However, performing online learning for such a unified model imposes a great challenge: a segmentation model cannot easily learn from the prior information given in a visual tracking scenario. In this paper, we propose TrackMLP, a novel meta-learning method optimized to learn from only partial information, to resolve the imposed challenge. Our model is capable of extensively exploiting the limited prior information and hence possesses stronger target-background discriminability than other online learning methods. Empirically, we show that our model achieves state-of-the-art performance and tangible improvements over competing models. Our model achieves average overlaps of 66.0%, 67.1%, and 68.5% on the VOT2019, VOT2018, and VOT2016 datasets, which are 6.4%, 7.3%, and 6.4% higher than our baselines, respectively. Code will be made publicly available.
We propose a manager-worker framework based on deep reinforcement learning to tackle a hard yet nontrivial variant of the Travelling Salesman Problem (TSP), i.e., the multiple-vehicle TSP with Time Windows and Rejections (MTSPTWR), where customers who cannot be served before the deadline are subject to rejection. In particular, in the proposed framework, a manager agent learns to divide the MTSPTWR into sub-routing tasks by assigning customers to each vehicle via a Graph Isomorphism Network (GIN) based policy network. A worker agent learns to solve the sub-routing tasks by minimizing the cost in terms of both travelled length and rejection rate for each vehicle, the maximum of which is then fed back to the manager agent to learn better assignments. Experimental results demonstrate that the proposed framework outperforms strong baselines in terms of both higher solution quality and shorter computation time. More importantly, the trained agents also achieve competitive performance when solving unseen larger instances.
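The worker objective and the manager's max-feedback described above can be sketched as a toy cost function: a vehicle travels its assigned tour, customers reached after their deadline count as rejections, and the manager is trained on the worst sub-tour cost. The depot-at-origin convention and the penalty weights are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def tour_cost(coords, deadlines, speed=1.0, alpha=1.0, beta=10.0):
    """Cost of one vehicle's tour: travel length plus a rejection penalty.

    coords:    (n, 2) customer locations, visited in order, depot at origin
    deadlines: (n,) latest service times; later arrivals are rejected
    """
    pos, t, length, rejected = np.zeros(2), 0.0, 0.0, 0
    for c, dl in zip(coords, deadlines):
        leg = np.linalg.norm(c - pos)
        length += leg
        t += leg / speed
        if t > dl:
            rejected += 1
        pos = c
    return alpha * length + beta * rejected / len(coords)

def manager_cost(per_vehicle_costs):
    """The manager learns assignments against the worst (max) sub-tour cost."""
    return max(per_vehicle_costs)
```

Feeding back the maximum rather than the sum pushes the manager toward balanced assignments, since one overloaded vehicle dominates the training signal.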
Image restoration schemes based on pre-trained deep models have received great attention due to their unique flexibility for handling various inverse problems. In particular, the Plug-and-Play (PnP) framework is a popular and powerful tool that can integrate an off-the-shelf deep denoiser for different image restoration tasks with a known observation model. However, obtaining an observation model that exactly matches the actual one can be challenging in practice. Thus, PnP schemes with conventional deep denoisers may fail to produce satisfying results in some real-world image restoration tasks. We argue that the robustness of the PnP framework is largely limited by using off-the-shelf deep denoisers trained via deterministic optimization. To this end, we propose a novel deep reinforcement learning (DRL) based PnP framework, dubbed RePNP, by leveraging a lightweight DRL-based denoiser for robust image restoration tasks. Experimental results demonstrate that the proposed RePNP is robust to deviations of the observation model used in the PnP scheme from the actual one. Thus, RePNP can generate more reliable restoration results for image deblurring and super-resolution tasks. Compared with several state-of-the-art deep image restoration baselines, RePNP achieves better results under model deviation with fewer model parameters.
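The PnP idea the abstract builds on can be sketched as an ISTA-style iteration: a gradient step on the data-fidelity term under the assumed observation model, followed by a denoiser acting as an implicit prior. The 3-tap moving average below is only a stand-in for a deep (or, in RePNP, a DRL-based) denoiser; the matrix-form observation model and step size are illustrative assumptions.

```python
import numpy as np

def box_denoiser(x):
    """Placeholder for an off-the-shelf deep denoiser: 3-tap moving average."""
    p = np.pad(x, 1, mode="edge")
    return (p[:-2] + p[1:-1] + p[2:]) / 3.0

def pnp_ista(y, A, denoiser, step=0.5, iters=50):
    """Plug-and-Play ISTA for y = A x + noise: alternate a gradient step on
    ||A x - y||^2 with a denoising step that plays the role of the prior."""
    x = A.T @ y
    for _ in range(iters):
        x = denoiser(x - step * A.T @ (A @ x - y))
    return x
```

The robustness question the paper raises arises precisely here: if `A` deviates from the true observation model, a denoiser trained by deterministic optimization can amplify the mismatch over iterations.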
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the challenge of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
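One simple instance of the image-level alignment mentioned above is global photometric alignment by channel statistics: shift a source image's per-channel mean and standard deviation to match the target domain's. This is only a minimal sketch of the general idea; the paper's module is more elaborate and is combined with texture and feature-level alignment.

```python
import numpy as np

def photometric_align(src, tgt, eps=1e-6):
    """Match a source image's per-channel mean/std to the target image's
    global statistics -- a simple form of image-level domain alignment.

    src, tgt: (H, W, C) float arrays
    """
    s_mu, s_sd = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    t_mu, t_sd = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))
    return (src - s_mu) / (s_sd + eps) * t_sd + t_mu
```

After alignment, the low-level statistics of source-domain training images resemble those of the target domain, reducing the image-level part of the domain gap before any feature-level adaptation.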
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
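The core idea behind Koopman-style operators, on which the KNO family above builds, can be illustrated with the classic least-squares (DMD-like) estimate: fit a finite-dimensional linear operator K mapping each snapshot to the next, then roll it forward for long-term prediction. This is a didactic sketch, not KoopmanLab's neural parameterization.

```python
import numpy as np

def fit_koopman(X, Y):
    """Least-squares estimate of a finite-dimensional Koopman operator K
    such that Y ~= K X, where columns of X and Y are successive snapshots
    of (observables of) the system state."""
    return Y @ np.linalg.pinv(X)

def rollout(K, x0, steps):
    """Long-term prediction by repeatedly applying the linear operator."""
    xs = [x0]
    for _ in range(steps):
        xs.append(K @ xs[-1])
    return np.stack(xs)
```

KNO replaces the fixed observables of classical DMD with learned neural encoders/decoders, which is what makes the linear-evolution picture applicable to nonlinear PDEs such as Navier-Stokes.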
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
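The style-aware adaptation described above, where a style code adjusts feed-forward weights, can be sketched as a FiLM-like per-unit gate on the first FFN matrix. The gating form, names, and shapes here are my own simplifying assumptions; StyleTalk's actual adaptive transformer differs in detail.

```python
import numpy as np

def style_ffn(x, W1, W2, style, Ws):
    """Feed-forward layer whose first weight matrix is rescaled per hidden
    unit by a gate predicted from the style code (a FiLM-like stand-in
    for the paper's style-aware adaptation).

    x: (d_in,) input, W1: (h, d_in), W2: (d_out, h),
    style: (d_s,) style code, Ws: (h, d_s) gate projection.
    """
    gate = 1.0 / (1.0 + np.exp(-(Ws @ style)))      # (h,) per-unit scale
    h = np.maximum(0.0, (W1 * gate[:, None]) @ x)   # modulated layer + ReLU
    return W2 @ h
```

Because the gate is a function of the style code, the same decoder weights produce differently-stylized facial motion for different reference videos, without retraining the decoder per style.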